Matching for Run-Length Encoded Strings

Authors

  • Alberto Apostolico
  • Gad M. Landau
  • Steven Skiena
Abstract

1 Motivation

Measuring the similarity between two strings, through such standard measures as Hamming distance, edit distance, and longest common subsequence, is one of the fundamental problems in pattern matching. We consider the problem of finding the longest common subsequence of two strings. A well-known dynamic programming algorithm computes the longest common subsequence of strings X and Y in O(|X| |Y|) time. In this paper, we develop significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems.

A string S is run-length encoded if it is described as an ordered sequence of pairs (σ, i), each consisting of an alphabet symbol σ and an integer i. Each pair corresponds to a run in S consisting of i consecutive occurrences of σ. For example, the string aaaabbbbcccabbbbcc can be encoded as a^4 b^4 c^3 a^1 b^4 c^2. Such a run-length encoded string can be significantly shorter than the expanded string representation. Indeed, run-length coding serves as a popular image compression technique, since many classes of images, such as binary images in facsimile transmission, typically contain large patches of identically-valued pixels.

The need to approximately match run-length encoded strings emerged during development of an optical character recognition (OCR) system. This system, built in association with Data Capture Systems Inc. [8], has been designed to achieve a low substitution error-rate via fixed-font character recognition. The ith row or column of pixels in a given query character image will define a binary string containing a small number of white-black transitions. By comparing this run-length encoded string against the ith row or column of each of the character image-models, we can identify
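As a rough illustration of the two notions the abstract relies on, the sketch below run-length encodes a string into (symbol, count) pairs and computes the longest common subsequence length with the well-known quadratic dynamic program on the expanded strings. This is only the baseline the abstract mentions, not the faster algorithm the paper develops; the function names `rle_encode`, `rle_decode`, and `lcs_length` are chosen here for illustration.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as a list of (symbol, count) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(s)]


def rle_decode(pairs):
    """Expand a list of (symbol, count) pairs back into a plain string."""
    return "".join(symbol * count for symbol, count in pairs)


def lcs_length(x, y):
    """Baseline O(|X| |Y|) dynamic program on the expanded strings.

    Keeps only the previous row of the table, so it uses O(|Y|) space.
    """
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # The example string from the abstract.
    print(rle_encode("aaaabbbbcccabbbbcc"))
    print(lcs_length("aaaabbbbccc", "abbbbcc"))
```

The encoded form has one pair per run, so for inputs with long runs it can be far shorter than the expanded string, which is the property the faster algorithms exploit.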


Similar resources

Edit distance of run-length encoded strings

Let X and Y be two run-length encoded strings, of encoded lengths k and l, respectively. We present a simple O(|X|l + |Y|k) time algorithm that computes their edit distance. © 2002 Elsevier Science B.V. All rights reserved.

Parallel processing of encoded bit strings

Many operations on strings of length n can be speeded up by a factor of p using p processors. String operations can also be speeded up, even when a single processor is used, by compactly encoding the strings, e.g. using run length code. This paper shows how to combine these two approaches by using p processors to process compactly encoded strings.

Parameterized Searching with Mismatches for Run-Length Encoded Strings - (Extended Abstract)

Parameterized matching between two strings occurs when it is possible to reduce the first one to the second by a renaming of the alphabet symbols. We present an algorithm for searching for parameterized occurrences of a pattern in a text string when both are given in run-length encoded form. The proposed method extends to alphabets of arbitrary yet constant size with O(|r_p| × |r_t|) time bounds, ...
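To make the definition of a parameterized match concrete, here is a minimal sketch that checks whether one string can be turned into another by renaming symbols. It assumes the renaming must be one-to-one and that every symbol may be renamed, and it works on expanded strings of equal length; it is not the run-length encoded search algorithm the abstract describes. The name `parameterized_match` is hypothetical.

```python
def parameterized_match(s, t):
    """Return True if s can be turned into t by a one-to-one renaming of symbols.

    Assumption: every symbol is a renameable parameter and the renaming
    is a bijection between the symbols used in s and those used in t.
    """
    if len(s) != len(t):
        return False
    forward = {}   # symbol of s -> symbol of t
    backward = {}  # symbol of t -> symbol of s
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the stored one,
        # so any later conflicting pairing is detected here.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


if __name__ == "__main__":
    print(parameterized_match("aabb", "ccdd"))
    print(parameterized_match("aabb", "ccdc"))
```

Tracking the mapping in both directions is what rules out two distinct symbols being renamed to the same one.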

Sequence Alignment Algorithms for Run-Length-Encoded Strings

A unified framework is applied to solving various sequence comparison problems for run-length encoded strings. All of these algorithms take O(min{mn′,m′n}) time and O(max{m,n}) space, for two strings of lengths m and n, with m′ and n′ runs, respectively. We assume the linear-gap model and make no assumption on the scoring matrices, which maximizes the applicability of these algorithms. The trac...

Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms tha...


Journal:
  • J. Complexity

Volume 15  Issue 

Pages  -

Publication date 1999